InstanceFormer: An Online Video Instance Segmentation Framework

Authors

Abstract

Recent transformer-based offline video instance segmentation (VIS) approaches achieve encouraging results and significantly outperform online approaches. However, their reliance on the whole video and the immense computational complexity caused by full spatio-temporal attention limit them in real-life applications such as processing lengthy videos. In this paper, we propose a single-stage, efficient online VIS framework named InstanceFormer, which is especially suitable for long and challenging videos. We propose three novel components to model short-term and long-term dependency and temporal coherence. First, we propagate the representation, location, and semantic information of prior instances to model short-term changes. Second, we propose a memory cross-attention in the decoder, which allows the network to look into earlier instances within a certain temporal window. Finally, we employ a temporal contrastive loss to impose coherence in the representation of an instance across all frames. Memory attention and temporal coherence are particularly beneficial to long-range dependency modeling, including challenging scenarios like occlusion. The proposed InstanceFormer outperforms previous online benchmark methods by a large margin across multiple datasets. Most importantly, InstanceFormer surpasses offline approaches on challenging and long datasets such as YouTube-VIS-2021 and OVIS. Code is available at https://github.com/rajatkoner08/InstanceFormer.
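The abstract does not give the form of the contrastive loss, so the following is only a generic InfoNCE-style sketch of the idea of pulling together embeddings of the same instance across frames and pushing apart embeddings of different instances. The function name `temporal_contrastive_loss`, its arguments, and the temperature value are assumptions made for illustration; they are not taken from the InstanceFormer code.

```python
import numpy as np

def temporal_contrastive_loss(anchor, positives, negatives, temperature=0.1):
    """Generic InfoNCE-style loss for one instance embedding (illustrative only).

    anchor:    (d,)   embedding of an instance in the current frame
    positives: (p, d) embeddings of the same instance in other frames
    negatives: (n, d) embeddings of different instances
    """
    def normalize(x):
        # Unit-normalize so dot products are cosine similarities.
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    a = normalize(np.asarray(anchor, dtype=float))
    pos = normalize(np.asarray(positives, dtype=float)) @ a / temperature  # (p,)
    neg = normalize(np.asarray(negatives, dtype=float)) @ a / temperature  # (n,)

    # Per positive i: -log( exp(pos_i) / (exp(pos_i) + sum_j exp(neg_j)) ),
    # computed in log space for numerical stability.
    neg_lse = np.logaddexp.reduce(neg)
    losses = np.logaddexp(pos, neg_lse) - pos
    return float(losses.mean())

# Hypothetical 2-D embeddings: the anchor matches its positives closely.
anchor = [1.0, 0.0]
same_instance = [[1.0, 0.0], [0.9, 0.1]]
other_instances = [[-1.0, 0.0], [0.0, 1.0]]
loss_consistent = temporal_contrastive_loss(anchor, same_instance, other_instances)
loss_inconsistent = temporal_contrastive_loss(anchor, other_instances, same_instance)
```

In this toy setup the loss is small when an instance's embeddings stay similar across frames and large when they resemble other instances instead, which is the behaviour a temporal coherence term is meant to encourage.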


Similar resources

MaskRNN: Instance Level Video Object Segmentation

Instance level video object segmentation is an important technique for video editing and compression. To capture the temporal coherence, in this paper, we develop MaskRNN, a recurrent neural net approach which fuses in each frame the output of two deep nets for each object instance — a binary segmentation net providing a mask and a localization net providing a bounding box. Due to the recurrent...


Instance Embedding Transfer to Unsupervised Video Object Segmentation

We propose a method for unsupervised video object segmentation by transferring the knowledge encapsulated in image-based instance embedding networks. The instance embedding network produces an embedding vector for each pixel that enables identifying all pixels belonging to the same object. Though trained on static images, the instance embeddings are stable over consecutive video frames, which a...


An Integrated framework for Robust and Fast Automatic Video Segmentation

In this paper, we extend previous work on video segmentation [1,2] and propose a novel integrated module-based framework for real-time applications which require automatic, precise, and fast-based features. The framework consists of five main modules: sprite generator, key frame identifier, change detector, video object plane (VOP) extractor and post-processor. A universal approach for sprite-o...


An Online Learning Framework for Sports Video View Classification

Sports videos have special characteristics such as well-defined video structure, specialized sports syntax, and some canonical view types. In this paper, we propose an online learning framework for sports video structure analysis, using baseball as an example. This framework, in which only a very small number of pre-labeled training samples are required at the initial stage, employs an optimal loc...


Shape-aware Instance Segmentation

We address the problem of instance-level semantic segmentation, which aims at jointly detecting, segmenting and classifying every individual object in an image. In this context, existing methods typically propose candidate objects, usually as bounding boxes, and directly predict a binary mask within each such proposal. As a consequence, they cannot recover from errors in the object candidate ge...



Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2023

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v37i1.25201